    Adapting End-to-End Speech Recognition for Readable Subtitles

    Automatic speech recognition (ASR) systems are primarily evaluated on transcription accuracy. However, in some use cases such as subtitling, verbatim transcription would reduce output readability given limited screen size and reading time. Therefore, this work focuses on ASR with output compression, a task challenging for supervised approaches due to the scarcity of training data. We first investigate a cascaded system, where an unsupervised compression model is used to post-edit the transcribed speech. We then compare several methods of end-to-end speech recognition under output length constraints. The experiments show that with limited data far less than needed for training a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities. Furthermore, the best performance in terms of WER and ROUGE scores is achieved by explicitly modeling the length constraints within the end-to-end ASR system. Comment: IWSLT 202

    Generalized liquid crystals: giant fluctuations and the vestigial chiral order of $I$, $O$ and $T$ matter

    The physics of nematic liquid crystals has been the subject of intensive research since the late 19th century. However, because of the limitations of chemistry, the focus has been centered around uni- and biaxial nematics associated with constituents bearing a $D_{\infty h}$ or $D_{2h}$ symmetry, respectively. In view of general symmetries, however, these are singularly special since nematic order can in principle involve any point group symmetry. Given the progress in tailoring nanoparticles with particular shapes and interactions, this vast family of "generalized nematics" might become accessible in the laboratory. Little is known since the order parameter theories associated with the highly symmetric point groups are remarkably complicated, involving tensor order parameters of high rank. Here we show that the generic features of the statistical physics of such systems can be studied in a highly flexible and efficient fashion using a mathematical tool borrowed from high energy physics: discrete non-Abelian gauge theory. Explicitly, we construct a family of lattice gauge models encapsulating nematic ordering of general three-dimensional point group symmetries. We find that the most symmetrical "generalized nematics" are subjected to thermal fluctuations of unprecedented severity. As a result, novel forms of fluctuation phenomena become possible. In particular, we demonstrate that a vestigial phase carrying no more than chiral order becomes ubiquitous departing from high point group symmetry chiral building blocks, such as $I$, $O$ and $T$ symmetric matter. Comment: 14 pages, 5 figures; published version

    Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection

    Encoder-decoder models provide a generic architecture for sequence-to-sequence tasks such as speech recognition and translation. While offline systems are often evaluated on quality metrics like word error rate (WER) and BLEU, latency is also a crucial factor in many practical use cases. We propose three latency reduction techniques for chunk-based incremental inference and evaluate their efficiency in terms of the accuracy-latency trade-off. On the 300-hour How2 dataset, we reduce latency by 83% to 0.8 seconds by sacrificing 1% WER (6% rel.) compared to offline transcription. Although our experiments use the Transformer, the hypothesis selection strategies are applicable to other encoder-decoder models. To avoid expensive re-computation, we use a unidirectionally-attending encoder. After an adaptation procedure to partial sequences, the unidirectional model performs on par with the original model. We further show that our approach is also applicable to low-latency speech translation. On How2 English-Portuguese speech translation, we reduce latency to 0.7 seconds (-84% rel.) while incurring a loss of 2.4 BLEU points (5% rel.) compared to the offline system.
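
    The WER metric quoted in the abstract above is conventionally defined as the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is a plain whitespace split,
    which real evaluation toolkits usually refine (casing, punctuation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

    Under this definition, an absolute change is a difference between two WER values, while a relative change divides that difference by the baseline WER; the abstract reports both.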

    Spin-dependent Klein tunneling in graphene: Role of Rashba spin-orbit coupling

    Within an effective Dirac theory the low-energy dispersions of monolayer graphene in the presence of Rashba spin-orbit coupling and spin-degenerate bilayer graphene are described by formally identical expressions. We explore implications of this correspondence for transport by choosing chiral tunneling through pn and pnp junctions as a concrete example. A real-space Green's function formalism based on a tight-binding model is adopted to perform the ballistic transport calculations, which cover and confirm previous theoretical results based on the Dirac theory. Chiral tunneling in monolayer graphene in the presence of Rashba coupling is shown to indeed behave like in bilayer graphene. Combined effects of a forbidden normal transmission and spin separation are observed within the single-band n to p transmission regime. The former comes from real-spin conservation, in analogy with pseudospin conservation in bilayer graphene, while the latter arises from the intrinsic spin-Hall mechanism of the Rashba coupling. Comment: 10 pages, 10 figures